Enough with the ratings. Now merge all the newly created data columns.
They are:

  1. "tot_labour_cost($)" from labour df
  2. all columns from food df. For the merge to work smoothly, rename the column food_id to simply _id. Ignore the nut_unit rows at this point
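The merge described in the list above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the frame names `main_df`, `labour_df` and `food_df`, the toy values, and the `wmr` and `calories` columns are hypothetical stand-ins. It also assumes that the labour df is keyed by `_id` and that `nut_unit` is a column that can simply be dropped for now. Only the names `tot_labour_cost($)`, `food_id`, `_id` and `nut_unit` come from the notes.

```python
import pandas as pd

# Hypothetical stand-ins for the real frames (toy values only).
main_df = pd.DataFrame({"_id": [1, 2, 3], "wmr": [4.5, 3.9, 4.1]})
labour_df = pd.DataFrame(
    {"_id": [1, 2, 3], "tot_labour_cost($)": [12.0, 8.5, 15.25]}
)
food_df = pd.DataFrame(
    {
        "food_id": [1, 2, 3],
        "calories": [250, 410, 180],
        "nut_unit": ["g", "g", "g"],
    }
)

# Rename the key so every frame shares "_id", and set nut_unit aside
# (assumption: nut_unit is a column that is not needed at this stage).
food_df = food_df.rename(columns={"food_id": "_id"}).drop(columns=["nut_unit"])

# Bring in only the labour cost column, then every remaining food column.
merged_df = (
    main_df
    .merge(labour_df[["_id", "tot_labour_cost($)"]], on="_id", how="left")
    .merge(food_df, on="_id", how="left")
)
print(merged_df)
```

A left join keeps every row of the main frame even when a recipe has no labour or food record, so missing matches show up as NaN rather than silently disappearing.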

Serious questions to ask the data

Still doing EDA...

Notes from EDA

Numerical features

Weighted mean rating, wmr

Servings

CALORIES PER SERVING

PROTEIN

CARBS

FAT

SODIUM

TOTAL LABOUR COST

CATEGORICAL FEATURES

Keys to the encoded data. For all levels other than sodium: 0 --> high, 1 --> low, 2 --> normal.

For sodium levels: 0 --> bad, 1 --> good, 2 --> normal. So, in both cases, the highest code (2) corresponds to a normal level.
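The codes in the key above match what you get by sorting the labels alphabetically and numbering them, which is how encoders such as scikit-learn's `LabelEncoder` assign integers. Whether that encoder was actually used here is an assumption. The sketch below reproduces the mapping in plain Python; the helper `alphabetical_codes` is a hypothetical name introduced for illustration.

```python
# Level labels as they appear in the key above.
nutrient_levels = ["high", "low", "normal"]
sodium_levels = ["bad", "good", "normal"]


def alphabetical_codes(labels):
    """Map each distinct label to its rank in sorted (alphabetical) order."""
    return {label: code for code, label in enumerate(sorted(set(labels)))}


nutrient_codes = alphabetical_codes(nutrient_levels)
sodium_codes = alphabetical_codes(sodium_levels)

# Inverse lookups, handy for turning codes back into labels on plots.
nutrient_labels = {code: label for label, code in nutrient_codes.items()}
sodium_labels = {code: label for label, code in sodium_codes.items()}

print(nutrient_codes)
print(sodium_codes)
```

Because the integers follow alphabetical order rather than severity, they should be read as category names, not as a scale: a code of 2 is not "more" of anything than a code of 0.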

CARB levels

PROTEIN levels

FAT levels

SODIUM levels

OVERALL FOOD QUALITY

Final thoughts

I scraped mainly chicken dishes, along with appetizers, smoothies, snacks, etc. The chicken recipes are not that many. This may have been reflected in the histograms of the major nutrients (carbs, fat, protein, and even salt) and calories. It would be interesting to see how the ML classifiers pick up on this unrepresentative sampling issue in the data. Again, this is not real research. We are just learning how things work ...

We could try to look at several of the features together, but there are too many combinations, even if we could view 5D scatter plots.

from itertools import combinations

# All 3-element combinations of the index labels
comb = list(combinations(list(final_encoded_scaled__clustered_df.index), 3))
len(comb)
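If only the number of combinations is of interest, it can be computed without building the full list. The sketch below uses `math.comb` (Python 3.8+) for the count and `itertools.islice` to peek at the first few tuples. The `feature_names` list is a hypothetical stand-in based on the numerical feature headings in these notes, not the actual index of `final_encoded_scaled__clustered_df`.

```python
from itertools import combinations, islice
from math import comb

# Hypothetical feature names, taken from the numerical feature headings.
feature_names = [
    "wmr",
    "servings",
    "calories",
    "protein",
    "carbs",
    "fat",
    "sodium",
    "tot_labour_cost",
]

# Count the 3-way and 5-way combinations without materialising them.
n_triples = comb(len(feature_names), 3)
n_quintuples = comb(len(feature_names), 5)

# Look at just the first five triples instead of the whole list.
first_few = list(islice(combinations(feature_names, 3), 5))

print(n_triples, n_quintuples)
for triple in first_few:
    print(triple)
```

Counting first makes it easy to decide whether plotting every combination is realistic before any memory is spent on the list itself.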

There is a lot to unpack from the Agglomerative Clustering result above. For a start, we can say the following:

There is a lot to unpack from the KMeans Clustering result above, too.